Acoustic Model Training with Detecting Transcription Errors in the Training Data

Authors

  • Gakuto Kurata
  • Nobuyasu Itoh
  • Masafumi Nishimura
Abstract

As the target of Automatic Speech Recognition (ASR) has moved from clean read speech to spontaneous conversational speech, we need to prepare orthographic transcripts of spontaneous conversational speech to train acoustic models (AMs). However, it is expensive and slow to manually transcribe such speech word by word. We propose a framework to train an AM based on easy-to-make rough transcripts in which fillers and small word fragments are not precisely transcribed and some transcription errors are included. By focusing on the phone duration in the result of forced alignment between the rough transcripts and the utterances, we can automatically detect the erroneous parts in the rough transcripts. A preliminary experiment showed that we can detect the erroneous parts with moderately high recall and precision. Through ASR experiments with conversational telephone speech, we confirmed that automatic detection helped improve the performance of the AM trained with both the conventional maximum likelihood (ML) criterion and the state-of-the-art boosted maximum mutual information (MMI) criterion.
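The abstract does not state the exact duration criterion, so the following is only a minimal sketch of the general idea: given a forced alignment, flag phones whose duration is far from the typical duration of that phone. The function name `flag_suspicious_phones`, the `(phone, start, end)` tuple layout, the use of the per-phone median, and the ratio thresholds are all assumptions made for illustration, not the authors' method.

```python
from statistics import median


def flag_suspicious_phones(alignment, min_ratio=0.3, max_ratio=3.0):
    """Return indices of aligned phones with anomalous durations.

    alignment: list of (phone, start_sec, end_sec) tuples, assumed to come
    from forced alignment of a rough transcript against the audio.
    min_ratio / max_ratio: hypothetical thresholds on the ratio between a
    phone's duration and the median duration of that phone label.
    """
    # Collect all observed durations for each phone label.
    durations_by_phone = {}
    for phone, start, end in alignment:
        durations_by_phone.setdefault(phone, []).append(end - start)

    # Use the median as a robust estimate of the typical duration.
    typical = {p: median(d) for p, d in durations_by_phone.items()}

    flagged = []
    for idx, (phone, start, end) in enumerate(alignment):
        typical_dur = typical[phone]
        if typical_dur <= 0:
            # No usable reference duration; skip rather than divide by zero.
            continue
        ratio = (end - start) / typical_dur
        if ratio < min_ratio or ratio > max_ratio:
            flagged.append(idx)
    return flagged


# Toy alignment: the last "AH" is stretched to 0.9 s, as might happen when
# the aligner has to absorb speech that is missing from the transcript.
toy_alignment = [
    ("AH", 0.00, 0.08),
    ("AH", 0.08, 0.17),
    ("AH", 0.17, 0.27),
    ("AH", 0.27, 0.35),
    ("AH", 0.35, 1.25),
]
suspicious = flag_suspicious_phones(toy_alignment)
```

In a real system the typical durations would more plausibly be estimated from a large, trusted corpus rather than from the utterance being checked, and the flagged regions would be excluded from, or down-weighted in, acoustic model training.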

Similar resources

Automatic verification of broadcast news transcriptions

In this paper we present a method for automatically detecting erroneous training scripts for speech corpora like Broadcast News and Switchboard. Based on the Hub-4 task we will report on the performance of error detection with the proposed method and investigate the effects of both manually and automatically cleaned training corpora on the performance of the RWTH speech recognition system. Our ...

Detecting Depression in Elderly People by Using Artificial Neural Network

Introduction: The possibility of depression is common in the elderly. Novel technologies allow us to monitor people related to depression. Hence, a model was provided to detect depression in elderly based on artificial neural network (ANN). Methods: The present study is an applied descriptive-survey research. Forty elderly people were randomly selected from the Elderly Care Center in Gonbad Ka...

Selective training of HMMs by using two-stage clustering

This paper proposes a method of constructing acoustic models from training data clustered in two stages. In the first stage, training data from a target task are clustered and generate GMMs for each cluster. The second stage uses the GMMs to select training data from a large-scale database based on the GMM likelihood. MAP estimation adapts an acoustic model for each cluster using the selected t...

INVESTIGATING THE EFFECT OF TEACHING SBAR COMMUNICATION MODEL ON THE FREQUENCY OF MEDICATION ERRORS AMONG NURSES WORKING IN RAZI PSYCHIATRIC TRAINING AND TREATMENT CENTER IN URMIA IN 2018

Background & Aims: Mistakes in psychiatric admissions are due to the greater turnover of patients and nursing staff and frequent changes in prescriptions, and drug use often occurs in noisy states and during meals. The SBAR (Situation, Background, Assessment, and Recommendation) model is a highly effective tool that provides a predictable, common structure for communication. The purpose of this st...

Automatic Lecture Transcription Based on Discriminative Data Selection for Lightly Supervised Acoustic Model Training

The paper addresses a scheme of lightly supervised training of an acoustic model, which exploits a large amount of data with closed caption texts but not faithful transcripts. In the proposed scheme, a sequence of the closed caption text and that of the ASR hypothesis by the baseline system are aligned. Then, a set of dedicated classifiers is designed and trained to select the correct one among...


Publication date: 2011